Re: [SQL] MINUS and slow 'not in'
От | Herouth Maoz |
---|---|
Тема | Re: [SQL] MINUS and slow 'not in' |
Дата | |
Msg-id | l03110702b28025746cbe@[147.233.159.109] обсуждение исходный текст |
Ответ на | MINUS and slow 'not in' (pierre <pierre@desertmoon.com>) |
Ответы |
Re: [SQL] MINUS and slow 'not in'
|
Список | pgsql-sql |
At 6:53 +0200 on 24/11/98, pierre wrote: > I then tried using a 'not in' clause. > > select * from A where user_id not in (select * from B); > > This is VERY slow, and examining the explain output tells me that it will > use the user_id index for table B, but a sequential scan of A even though > A has an index for the user_id column. First, I assume you meant "select user_id from B", not "select *", or something is very strange here. Anyway, suppose you had two tables. How would you go about doing this while using *both* indices? I don't think it's possible. You have a condition that says: include each row which doesn't meet a certain criteria. The only way to do it is to scan each row, get the value of its user_id, and then go to be, and use its index to find if the user_id we already have is NOT there. You can use an index only when you have a specific value to search for. A NOT IN clause doesn't supply a specific value, so you can't use the outer index. You may try to convert the NOT IN to a NOT EXISTS clause, and see if it improves anything, but it will still require a sequential search. If I needed this query often, I'd try to optimize it by adding a column to table A, marking the records that match, and then selecting all the records which don't match. I'm not sure whether one can index a boolean field in the current version of PostgreSQL, but if not, you can probably use a char field instead. I suppose you can make sure this column stays up-to-date with rules, or do the update as a preparatory step: Update all to "N", then update all the fields that match to "Y" with a join. VACUUM, ANALYZE, and then you can start selecting. This requires two sequential scans plus vacuums before you start selecting, so it may not be worth it if you only select once by this criteria... I'd go with the NOT IN or NOT EXISTS solution, which gives you a sequential scan with minimal search over the index of table B. SELECT * FROM A WHERE NOT EXISTS ( SELECT * FROM B WHERE B.user_id = A.user_id ); By the way, if you have any specific criteria on A, besides the NOT EXISTS or NOT IN, they may cause an index scan on A as well. Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma
В списке pgsql-sql по дате отправления: